🌻 The factors table

22 Sep 2025

Summary

This extension builds a factors table from the (already filtered) links table.

The key idea is: everything starts from links, and the factors table is just an aggregation of link-derived “factor mentions”.

What exactly is a “factor mention”?

Each link row typically contains at least:

From each link row we derive two factor-mention records:

These mention records are the atomic units that the factors table aggregates.
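As a minimal sketch of this derivation, the snippet below expands each link row into two mention records, one for each end of the link. The field names (`cause`, `effect`, `source_id`) and the `Mention` type are hypothetical and chosen only for illustration; the role labels `"out"` and `"in"` are likewise an assumption about how direction is recorded.

```python
from typing import NamedTuple

class Mention(NamedTuple):
    factor: str     # factor label
    role: str       # "out" for the cause end of the link, "in" for the effect end
    source_id: str  # source the link came from
    link_id: int    # position of the originating link row

def mentions_from_links(links):
    """Expand each link row into two mention records, one per end of the link."""
    mentions = []
    for link_id, link in enumerate(links):
        mentions.append(Mention(link["cause"], "out", link["source_id"], link_id))
        mentions.append(Mention(link["effect"], "in", link["source_id"], link_id))
    return mentions

# Hypothetical, already-filtered link rows.
links = [
    {"cause": "Training", "effect": "Income", "source_id": "S1"},
    {"cause": "Income", "effect": "Wellbeing", "source_id": "S1"},
    {"cause": "Training", "effect": "Income", "source_id": "S2"},
]
mentions = mentions_from_links(links)
```

Three links yield six mentions, so any count built on mentions is a count of link ends rather than of links.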

All upstream filters apply first. This filter does not try to re-implement link filtering; it only consumes the current link set.

For each link row \(r\):

Notes:

Step 2. Apply factor label transforms (rewrite layer)

Before aggregating, apply the current label-rewrite transforms to m.factor, such as:

These are temporary rewrites for analysis/presentation; they do not change the underlying coding.
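One way to picture the rewrite layer is as a chain of label-to-label functions applied to copies of the mention records, so the coded labels themselves are never modified. The two rewrites in this sketch (`trim` and `merge_synonyms`) are hypothetical stand-ins, not the transforms the app actually provides.

```python
def apply_rewrites(mentions, rewrites):
    """Return new mention records with rewritten labels; inputs are left untouched."""
    rewritten = []
    for m in mentions:
        label = m["factor"]
        for rewrite in rewrites:
            label = rewrite(label)
        rewritten.append({**m, "factor": label})  # copy, do not mutate the original
    return rewritten

# Two hypothetical rewrites, for illustration only.
def trim(label):
    return label.strip()

SYNONYMS = {"More income": "Income"}

def merge_synonyms(label):
    return SYNONYMS.get(label, label)

coded = [{"factor": " More income ", "source_id": "S1"}]
view = apply_rewrites(coded, [trim, merge_synonyms])
```

Because `apply_rewrites` returns copies, dropping or reordering the rewrites only changes the view that gets aggregated, not the underlying coding.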

Step 3. Aggregate into the base factors table

Group mention records by factor label \(f\). Compute one or more base columns (examples):

Derived role measures (examples):

The factors table is therefore an interpretation layer: it encodes choices about (a) rewrite rules, (b) evidence unit (citations vs sources), and (c) whether direction matters.
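The aggregation can be sketched as a plain group-by over mention records. The column names below (`citations`, `sources`, `in`, `out`, `in_share`) are illustrative assumptions: `citations` counts mentions, `sources` counts distinct sources, and `in_share` is a hypothetical derived role measure, not necessarily one the app computes.

```python
from collections import defaultdict

def build_factors_table(mentions):
    """Group mention records by factor label and compute a few base columns."""
    rows = defaultdict(lambda: {"citations": 0, "sources": set(), "in": 0, "out": 0})
    for m in mentions:
        row = rows[m["factor"]]
        row["citations"] += 1                # every mention counts once
        row["sources"].add(m["source_id"])   # distinct sources mentioning the factor
        row[m["role"]] += 1                  # role is "in" or "out"
    table = {}
    for factor, row in rows.items():
        table[factor] = {
            "citations": row["citations"],
            "sources": len(row["sources"]),
            "in": row["in"],
            "out": row["out"],
            # Hypothetical role measure: fraction of mentions that are incoming.
            "in_share": row["in"] / row["citations"],
        }
    return table

mentions = [
    {"factor": "Training", "role": "out", "source_id": "S1"},
    {"factor": "Income", "role": "in", "source_id": "S1"},
    {"factor": "Income", "role": "out", "source_id": "S1"},
    {"factor": "Wellbeing", "role": "in", "source_id": "S1"},
    {"factor": "Training", "role": "out", "source_id": "S2"},
    {"factor": "Income", "role": "in", "source_id": "S2"},
]
factors = build_factors_table(mentions)
```

Here "Income" has three citations from two sources, which shows why the choice of evidence unit changes the numbers a reader sees.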

Groups: breakdown columns

Let \(G\) be a group variable defined on sources (e.g. district, gender, section). This filter can add group breakdown columns by aggregating mention records jointly by \((\text{factor}, \text{group})\).

Step 4. Join source metadata to mentions (for grouping)

Join each mention record \(m\) to the attributes of its source, so that every mention carries a group value \(G(m)\).
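A minimal sketch of this join, assuming source metadata is available as a lookup from a hypothetical `source_id` to a dictionary of attributes. Mentions whose source has no value for the chosen variable receive `None` here; how the app treats such cases is not specified, so that behaviour is an assumption.

```python
def attach_group(mentions, sources, group_var):
    """Copy the chosen source attribute onto each mention as its group value."""
    joined = []
    for m in mentions:
        attrs = sources.get(m["source_id"], {})
        # None marks a mention whose source lacks the attribute (assumed handling).
        joined.append({**m, "group": attrs.get(group_var)})
    return joined

# Hypothetical source metadata and mentions.
sources = {
    "S1": {"district": "North", "gender": "F"},
    "S2": {"district": "South", "gender": "M"},
}
mentions = [
    {"factor": "Income", "role": "in", "source_id": "S1"},
    {"factor": "Income", "role": "in", "source_id": "S2"},
    {"factor": "Training", "role": "out", "source_id": "S3"},
]
grouped = attach_group(mentions, sources, "district")
```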

Step 5. Add the group breakdown columns

For each factor \(f\) and each group level \(g\), compute cells such as:

Optional: also split each cell by direction (in / out) if the UI supports it.
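The cells can be sketched as counts keyed by factor and group level, with direction as an optional third key. This sketch counts mentions; counting distinct sources per cell would be an equally valid evidence unit. The record fields (`factor`, `group`, `role`) are hypothetical names.

```python
from collections import Counter

def group_cells(mentions, by_direction=False):
    """Count mentions per (factor, group), optionally split further by direction."""
    cells = Counter()
    for m in mentions:
        key = (m["factor"], m["group"])
        if by_direction:
            key += (m["role"],)  # extend the key to (factor, group, "in" or "out")
        cells[key] += 1
    return cells

# Hypothetical mentions that already carry a group value.
mentions = [
    {"factor": "Income", "group": "North", "role": "in"},
    {"factor": "Income", "group": "North", "role": "out"},
    {"factor": "Income", "group": "South", "role": "in"},
    {"factor": "Training", "group": "South", "role": "out"},
]
cells = group_cells(mentions)
cells_by_direction = group_cells(mentions, by_direction=True)
```

A `Counter` returns 0 for combinations that never occur, which is convenient when laying the cells out as a full factor-by-group grid.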

What group columns are for

They let you ask:

Totals and normalisations (what totals actually mean)

Because the factors table is built from factor mentions:

Normalisation is a choice of baseline, not a cosmetic option:

The little equation below is just intuition (you can ignore it if you want).

\[ \text{share}(f,g) = \frac{\text{cell}(f,g)}{\sum_{f'} \text{cell}(f',g)} \]

Dividing by the group's column total is useful when groups differ in overall verbosity or number of sources, because raw counts would otherwise favour the larger group.
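As a small worked example of the share formula, the sketch below divides each cell by its group's column total. The counts are invented for illustration, and the function assumes every group has a nonzero total.

```python
def column_shares(cells):
    """cells maps (factor, group) to a count; returns each cell's share within its group."""
    totals = {}
    for (_factor, group), n in cells.items():
        totals[group] = totals.get(group, 0) + n
    # Assumes every group total is nonzero.
    return {key: n / totals[key[1]] for key, n in cells.items()}

# Invented counts: North is the larger group overall.
cells = {
    ("Income", "North"): 30,
    ("Training", "North"): 70,
    ("Income", "South"): 15,
    ("Training", "South"): 5,
}
shares = column_shares(cells)
```

"Income" has twice as many raw mentions in North (30 vs 15), yet its share is 0.30 in North and 0.75 in South, so the baseline chosen decides which group appears to emphasise it.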

Optional inference: significance testing per factor (single grouping variable)

If exactly one group variable \(G\) is selected, this filter can compute a per-factor test asking whether mentions for factor \(f\) are distributed across group levels differently than expected given group baselines.

Intuition (chi-squared style):

Even if group A has more mentions overall than group B, is factor \(f\) still over-represented in one group relative to that baseline?
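One possible reading of this intuition is a goodness-of-fit statistic: compare the observed mentions of \(f\) in each group with the counts expected if \(f\) simply followed each group's share of all mentions. The sketch below computes only the statistic and its degrees of freedom; it is an illustration under that assumption, not necessarily the exact test the filter runs (a factor-versus-rest contingency test is another common choice). It assumes every group has a nonzero baseline and that \(f\) has at least one mention.

```python
def chi_squared_vs_baseline(observed, baseline):
    """Goodness-of-fit statistic for one factor against group baselines.

    observed: group -> mentions of the factor in that group
    baseline: group -> all mentions in that group
    """
    total_observed = sum(observed.values())
    total_baseline = sum(baseline.values())
    statistic = 0.0
    for group, base in baseline.items():
        # Expected count if the factor followed the group's overall share of mentions.
        expected = total_observed * base / total_baseline
        statistic += (observed.get(group, 0) - expected) ** 2 / expected
    degrees_of_freedom = len(baseline) - 1
    return statistic, degrees_of_freedom

# Invented counts: group A has twice as many mentions overall as group B.
baseline = {"A": 100, "B": 50}
observed = {"A": 10, "B": 20}
statistic, dof = chi_squared_vs_baseline(observed, baseline)
```

With these counts the expected values are 20 for A and 10 for B, so the statistic is 5 + 10 = 15 on 1 degree of freedom: the factor is over-represented in B even though A is the larger group.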

For ordered groupings (e.g. age bands), an ordinal/trend framing can be more appropriate than treating levels as unordered categories.

Why this is useful#

This helps you move from “what is mentioned most?” to “what differs by context/group?”.